feat/fix(pip): enable local paths for experimental_index_url #3312

adrianimboden · 2025-10-02T01:14:32Z

Hi rules_python team

I have been using your rules for a long time now. With the WORKSPACE style, I use pip as follows:

pip_parse(
    name = "py_deps",
    extra_pip_args = [
        "--index-url=/home/user/local_pip_mirror",
        "--no-cache-dir",
    ],
    python_interpreter = "python3",
    python_interpreter_target = interpreter,
    requirements_lock = ":requirements_lock.txt",
)

The folder /home/user/local_pip_mirror gets populated with pypi-mirror download --requirement requirements_lock.txt

So I have a nice and clean way to do offline builds. I could never get it running directly with bazel fetch and the like. But this solution was very nice because no internet access was involved at all.

I am in the process of upgrading to bzlmod. I saw many bug reports about making offline builds work, but I honestly gave up on bazel vendor and the like again. The simplest way in my opinion is to just use a local pip mirror. For that I tried the following:

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    experimental_index_url = "/home/user/local_pip_mirror",
    hub_name = "py_deps",
    python_version = python_version,
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "py_deps")

At the moment, experimental_index_url must be an https:// URL to work.

This MR changes it so that local paths are also accepted as experimental_index_url.

For my project, the proposed changes are in effect and working great.

It is not perfect, but I think it is an important addition to aid the bzlmod migration.

gemini-code-assist · 2025-10-02T01:14:44Z

Summary of Changes

Hello @adrianimboden, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial feature that allows the experimental_index_url parameter within rules_python's pip.parse extension to accept local file system paths, in addition to traditional HTTPS URLs. This enhancement is designed to facilitate robust offline dependency resolution by enabling the use of local pip mirrors, which is particularly beneficial for users migrating to bzlmod and seeking a more flexible and reliable method for managing Python dependencies without requiring internet access during the build process.

Highlights

Local Paths for experimental_index_url: The experimental_index_url parameter in pip.parse now supports local file system paths, allowing users to specify local pip mirrors for package resolution.
Enhanced Offline Build Support for bzlmod: This change significantly improves support for offline builds when using bzlmod by enabling the use of local pip mirrors, addressing a common challenge for users migrating from WORKSPACE to bzlmod.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request adds a useful feature to enable local paths for pip indexes, which is great for offline builds. However, the current implementation introduces a few critical issues and regressions. I've found a critical regression in simpleapi_download.bzl that breaks fetching from multiple indexes. Additionally, the logic for handling local paths in both modified files has several bugs, including incorrect URL detection, lack of error handling, and a crash-inducing bug in whl_library.bzl due to a missing sha256 field. I've provided detailed comments and suggestions to address these issues, including a significant refactoring for whl_library.bzl to use a more robust approach with file:// URLs.

python/private/pypi/simpleapi_download.bzl

gemini-code-assist · 2025-10-02T01:16:44Z

python/private/pypi/whl_library.bzl

+        if len(urls) == 1 and not urls[0].startswith("https://"):
+            local_url = urls[0]
+            local_path = rctx.path(local_url)
+
+            if local_path.is_dir:
+                rctx.file(filename, rctx.read(local_url + "index.html"))
+                result = struct(
+                    output = filename,
+                    success = True,
+                )
+            else:
+                rctx.file(filename, rctx.read(local_url))
+                result = struct(
+                    output = filename,
+                    success = True,
+                )
+        else:
+            result = rctx.download(
+                url = urls,
+                output = filename,
+                sha256 = rctx.attr.sha256,
+                auth = get_auth(rctx, urls),
+            )


This block for handling local paths has several issues, including incorrect logic for wheel URLs (the is_dir check), a missing sha256 in the result which will cause a crash, a too-broad condition for identifying local paths, and lack of error handling. A much simpler and more robust approach is to convert the local path to a file:// URL and use rctx.download, which correctly handles all these aspects.

download_urls = urls
if len(urls) == 1 and not (urls[0].startswith("https://") or urls[0].startswith("http://")):
    # It's a local path, convert to a file:// URL for ctx.download
    abs_path = rctx.path(urls[0]).realpath
    download_urls = ["file://" + abs_path]
result = rctx.download(
    url = download_urls,
    output = filename,
    sha256 = rctx.attr.sha256,
    auth = get_auth(rctx, download_urls),
)
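The scheme check and path-to-URL conversion described in this review comment can be sketched in plain Python. This is only an illustration, not rules_python code: `to_download_url` and `URL_SCHEMES` are hypothetical names, and the sketch assumes POSIX-style absolute paths.

```python
from urllib.parse import quote

# Hypothetical helper, not part of rules_python: values that already carry
# one of these schemes are passed through unchanged.
URL_SCHEMES = ("https://", "http://", "file://")


def to_download_url(index_url):
    """Return a URL for a downloader, turning a bare absolute path into file://."""
    if index_url.startswith(URL_SCHEMES):
        return index_url
    if not index_url.startswith("/"):
        # Assumption: relative paths are rejected rather than resolved.
        raise ValueError("expected a URL or an absolute POSIX path: " + index_url)
    # An absolute path already starts with "/", which yields "file:///...".
    return "file://" + quote(index_url)
```

Checking for both http:// and https:// avoids the too-broad condition pointed out above, where an http:// URL would be mistaken for a local path.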

gemini-code-assist · 2025-10-02T01:16:44Z

python/private/pypi/simpleapi_download.bzl

+    if not real_url.startswith("https://"):
+        normalized_url = real_url
+        local_path = ctx.path(normalized_url)
+
+        def local_read_index_result():
+            if local_path.is_dir:
+                ctx.file(output, ctx.read(normalized_url + "index.html"))
+                return struct(
+                    output = output,
+                    success = True,
+                )
+            else:
+                ctx.file(output, ctx.read(normalized_url))
+                return struct(
+                    output = output,
+                    success = True,
+                )
+
+        download = struct(wait = lambda: local_read_index_result())


This logic for handling local paths has a few issues:

The condition not real_url.startswith("https://") is too broad. It will incorrectly treat http:// URLs as local paths.

The local_read_index_result function lacks error handling. If ctx.read fails because a file doesn't exist, it will cause the entire rule to fail, which prevents the logic in simpleapi_download from trying other index URLs.

The implementation of local_read_index_result can be simplified to avoid code duplication.

Here is a suggested implementation that addresses these points:

Suggested change

-    if not real_url.startswith("https://"):
-        normalized_url = real_url
-        local_path = ctx.path(normalized_url)
-
-        def local_read_index_result():
-            if local_path.is_dir:
-                ctx.file(output, ctx.read(normalized_url + "index.html"))
-                return struct(
-                    output = output,
-                    success = True,
-                )
-            else:
-                ctx.file(output, ctx.read(normalized_url))
-                return struct(
-                    output = output,
-                    success = True,
-                )
-
-        download = struct(wait = lambda: local_read_index_result())
+    if not (real_url.startswith("https://") or real_url.startswith("http://")):
+        normalized_url = real_url
+        local_path = ctx.path(normalized_url)
+
+        def local_read_index_result():
+            path_to_read = normalized_url
+            if local_path.is_dir:
+                path_to_read += "index.html"
+
+            if not ctx.path(path_to_read).exists:
+                return struct(success = False)
+
+            ctx.file(output, ctx.read(path_to_read))
+            return struct(
+                output = output,
+                success = True,
+            )
+
+        download = struct(wait = lambda: local_read_index_result())
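The point about error handling (a missing file should report failure so that other indexes can still be tried) can be illustrated with a small Python sketch. It is not the Starlark implementation; `Result`, `read_local_index` and `first_successful_index` are hypothetical names chosen for this example.

```python
from collections import namedtuple
from pathlib import Path

# Hypothetical stand-in for the struct(success = ..., output = ...) results.
Result = namedtuple("Result", ["success", "output"])


def read_local_index(index_path, output):
    """Copy a local index page to `output`; report failure instead of raising."""
    path = Path(index_path)
    if path.is_dir():
        path = path / "index.html"
    if not path.is_file():
        return Result(success=False, output=None)
    Path(output).write_text(path.read_text())
    return Result(success=True, output=output)


def first_successful_index(index_paths, output):
    """Try each index in order and return the first one that could be read."""
    for index_path in index_paths:
        if read_local_index(index_path, output).success:
            return index_path
    return None
```

Because a missing file yields `success=False` rather than an exception, the loop can move on to the next index, which is the behavior the review asks to preserve.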

aignas · 2025-10-02T04:16:34Z

python/private/pypi/whl_library.bzl

-        )
+        if len(urls) == 1 and not urls[0].startswith("https://"):
+            local_url = urls[0]
+            local_path = rctx.path(local_url)


Ideally, the whl_library should consume locally available paths through labels, because then it makes things easier to thread through and whl_library already accepts a label. Would it be possible to create a repository_rule that exposes these paths as labels?

aignas · 2025-10-02T04:17:53Z

This looks interesting - we have certainly discussed using a local path for the experimental_index_url in the past. A few things that come to mind:

Having a unit test that ensures that this feature works would be great. They can be added in tests/pypi/simpleapi_download directory.

How do you update the local mirror?

adrianimboden · 2025-10-02T08:33:02Z

I populate the mirror like this:

pypi-mirror download --requirement requirements_lock.txt --download-dir /tmp/download
pypi-mirror create --download-dir /tmp/download --mirror-dir /path/to/mirror --copy

The folder looks like this then:

├── index.html
├── aiohappyeyeballs
│   ├── aiohappyeyeballs-2.6.1-py3-none-any.whl
│   └── index.html
├── aiohttp
│   ├── aiohttp-3.12.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
│   └── index.html
├── aiosignal
│   ├── aiosignal-1.4.0-py3-none-any.whl
│   └── index.html
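A mirror laid out like this can be inspected with a few lines of Python. The following is a minimal sketch, assuming one directory per package with the wheels directly inside it, as in the tree above; `list_wheels` is a hypothetical helper, not part of pypi-mirror or rules_python.

```python
from pathlib import Path


def list_wheels(mirror_dir):
    """Map each package directory in the mirror to its sorted wheel file names."""
    wheels = {}
    for package_dir in sorted(Path(mirror_dir).iterdir()):
        if not package_dir.is_dir():
            continue  # skips the top-level index.html
        names = sorted(p.name for p in package_dir.glob("*.whl"))
        if names:
            wheels[package_dir.name] = names
    return wheels
```

For the tree above this would return one entry per package, for example `aiosignal` mapped to `["aiosignal-1.4.0-py3-none-any.whl"]`.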

I am not sure about the label stuff. Did you think about something like this?

new_local_repository = use_repo_rule("@bazel_tools//tools/build_defs/repo:local.bzl", "new_local_repository")

new_local_repository(
    name = "pip_deps_mirror",
    build_file_content = "exports_files(['**'])",
    path = "/home/thingdust/deps/pip_deps",
)

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    experimental_index_url = Label("@pip_deps_mirror"),
    hub_name = "py_deps",
    python_version = python_version,
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "py_deps")

I am not sure how easy that will be. It seems like a completely new code path to me, since the value may be either a URL or a label. Or am I missing something?

adrianimboden · 2025-10-02T08:44:01Z

After looking at it again, I saw that there is a much simpler solution to make it better in the meantime.

I wrongly assumed that the download functions don't work for local files. I always had the problem that file:// URLs did not work. I found out that this is because the URLs get normalized first. A small addition to strip_empty_path_segments makes it work with local URLs.

Making it work with labels would be nice though. Probably for another time?

adrianimboden · 2025-10-02T08:46:27Z

It did not work before because strip_empty_path_segments does the following: file:///path/to/folder -> file://path/to/folder. For local paths, empty segments should not be a problem I presume.
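The effect described here can be reproduced with a small Python sketch. Both functions below are written for illustration only: the first mimics the behavior as described in this comment (it is not copied from rules_python), and the second shows one possible adjustment that keeps the empty authority part of file:// URLs.

```python
def strip_empty_path_segments_naive(url):
    """Drop every empty segment after the scheme, as described above."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url  # no scheme, leave untouched
    stripped = "/".join([p for p in rest.split("/") if p])
    if url.endswith("/"):
        stripped += "/"
    return scheme + "://" + stripped


def strip_empty_path_segments(url):
    """Same idea, but keep the leading "/" of a file:// path."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url
    # Assumption: file:// URLs have an empty host, so the path's leading "/"
    # directly follows "://" and must not be treated as an empty segment.
    prefix = "/" if scheme == "file" and rest.startswith("/") else ""
    stripped = "/".join([p for p in rest.split("/") if p])
    if rest.endswith("/") and stripped:
        stripped += "/"
    return scheme + "://" + prefix + stripped
```

With the naive version, `file:///path/to/folder` becomes `file://path/to/folder`, so `path` is read as a host name. The adjusted version returns the URL unchanged while still collapsing doubled slashes elsewhere.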

rickeylev · 2025-10-03T03:32:35Z

I really like the idea of being able to point to a local path using a label, for several reasons.

It'd be really convenient for testing our pip integration -- we can easily construct arbitrary index states and have a more end-to-end verification.

It also seems like a really flexible and powerful way for customizing where pip is getting stuff from. You could write a repo rule to make the pip index look however you want, and be populated however you want.

groodt · 2025-10-03T04:14:49Z

Yes, this sort of thing is great. It's often called a "wheel house" and is a very common and useful pattern for offline builds, avoiding sdist in deployment scenarios, etc. Very supportive of this. Tools in a similarish space are: https://github.com/chriskuehl/dumb-pypi

aignas · 2025-10-05T04:45:29Z

Thinking out loud a little bit about how this could be designed. This might be a train of thought, but I'll just write it out as I think.

The idea of passing in a local path or something similar sounds good, but so far in bazel I've seen this work only if you pass an absolute path or a label. Hence I thought it would be nice to pass a label.
If one has labels for each whl file, then we can pass them to the whl_file attribute of whl_library: https://rules-python.readthedocs.io/en/latest/api/rules_python/python/private/pypi/whl_library.html#whl_library.whl_file
This means that the code in parse_requirements.bzl needs to inject those labels in some way.
In the future we may want to write the URLs into the lock file, so if they have absolute file:/// in them, this will not age well, so it is best to treat the local index as one that has the right format.
parse_requirements.bzl is called from hub_builder and gets the get_index_urls function as a parameter. We could have a separate implementation of that which returns labels instead of URLs; however, the label mapping should be present there in some way.
pip.parse could create a local index repository on the fly (i.e. a repository where we can access whls using a scheme of @local_index_repo_name//<whl_name>:<file_name>.whl). The extension reads the local directory structure and finds all whl files, then creates a repo and passes the whls as a list of paths/labels. The HTML files are only processed in the extension to avoid circular dependencies in the extension/starlark evaluation.
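The label scheme mentioned in the last point can be sketched in Python. This is a rough illustration under assumptions: `wheel_labels` is a hypothetical helper, and `<whl_name>` is interpreted here as the package directory name in the local index.

```python
def wheel_labels(repo_name, wheels):
    """Build labels of the form @<repo_name>//<package>:<wheel file name>.

    `wheels` maps a package directory name to a list of wheel file names.
    """
    labels = {}
    for package, filenames in wheels.items():
        labels[package] = [
            "@{}//{}:{}".format(repo_name, package, filename)
            for filename in filenames
        ]
    return labels
```

A mapping like this is what a label-returning variant of get_index_urls would need to have available, so that the whl_file attribute can be set instead of a URL.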

So to sum up, the files that would need to be touched:

whl_library - stays the same.
hub_builder.bzl - needs some extra handling of a different get_index_urls function. It should handle the case well where the whl (or dist) struct has whl_file but does not have url set.
parse_requirements.bzl - needs some minor fixing to accommodate a more generic getting of the wheels.
local_whl_repo.bzl - a new repository that contains the files.
simpleapi_local.bzl - a new file that handles traversing the local index.html tree.

There are probably ways to optimize this approach.

adrianimboden requested review from aignas, groodt and rickeylev as code owners October 2, 2025 01:14

gemini-code-assist bot reviewed Oct 2, 2025

aignas reviewed Oct 2, 2025

adrianimboden force-pushed the allow-local-paths-for-experimental-index-url branch from 104d221 to 4b93661 Compare October 2, 2025 08:40

allow file:// urls in experimental_index_url

f7cee05

adrianimboden force-pushed the allow-local-paths-for-experimental-index-url branch from 4b93661 to f7cee05 Compare October 2, 2025 08:42
